Search Result

Task scheduling algorithm based on weight in Storm

LU Liang, YU Jiong, BIAN Chen, YING Changtian, SHI Kangli, PU Yonglin

Journal of Computer Applications 2018, 38 (3): 699-706. DOI: 10.11772/j.issn.1001-9081.2017082125

Apache Storm, a typical platform for big data stream computing, uses a round-robin scheduling algorithm as its default scheduler, which ignores the differences in computational and communication cost that are ubiquitous among the tasks and data streams of a topology. Load balance and communication cost therefore leave room for optimization. To solve this problem, a Task Scheduling Algorithm based on Weight in Storm (TSAW-Storm) was proposed. In the algorithm, the CPU occupation of a task was taken as its weight in a specific topology, and the tuple rate between a pair of tasks was taken as the weight of the data stream between them. Tasks were then assigned one by one to the most suitable work node by maximizing the gain in data stream weight, that is, by turning as many inter-node data streams as possible into intra-node ones while keeping the load balanced, in order to reduce network overhead. Experimental results on the WordCount benchmark with 8 work nodes show that TSAW-Storm reduces latency and inter-node tuple rate by about 30.0% and 32.9% respectively, and that the standard deviation of CPU load of work nodes is only 25.8% when compared with the default Storm scheduling algorithm. Additionally, an online scheduler was deployed as a contrast experiment. Compared with the online scheduler, TSAW-Storm reduces latency, inter-node tuple rate and standard deviation of CPU load by about 7.76%, 11.8% and 5.93% respectively, while incurring only a small execution overhead. The proposed algorithm therefore reduces communication cost and improves load balance effectively, which contributes to the efficient operation of Apache Storm.
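The weight-driven placement described in this abstract can be illustrated with a small greedy sketch. It is not the published TSAW-Storm algorithm: the function names, the fixed per-node `capacity` cap used as a stand-in for load balance, and the heaviest-task-first ordering are all assumptions made for illustration.

```python
from collections import defaultdict

def assign_tasks(cpu, streams, nodes, capacity):
    """Greedy, weight-based placement (illustrative only).

    cpu:      {task: CPU weight}
    streams:  {(task_a, task_b): tuple rate}
    nodes:    list of node names
    capacity: assumed cap on total CPU weight per node
    """
    # Symmetric adjacency map of data stream weights.
    adj = defaultdict(dict)
    for (a, b), rate in streams.items():
        adj[a][b] = adj[a].get(b, 0) + rate
        adj[b][a] = adj[b].get(a, 0) + rate

    load = {n: 0.0 for n in nodes}
    placement = {}
    # Heaviest tasks first (an assumption, not taken from the paper).
    for task in sorted(cpu, key=cpu.get, reverse=True):
        best, best_key = None, None
        for n in nodes:
            if load[n] + cpu[task] > capacity:
                continue  # placing the task here would overload the node
            # Gain: stream weight that becomes intra-node on this node.
            gain = sum(w for peer, w in adj[task].items()
                       if placement.get(peer) == n)
            key = (gain, -load[n])  # prefer higher gain, then lighter node
            if best_key is None or key > best_key:
                best, best_key = n, key
        if best is None:
            raise ValueError(f"no node can host task {task!r}")
        placement[task] = best
        load[best] += cpu[task]
    return placement

def inter_node_rate(placement, streams):
    """Total tuple rate that still crosses node boundaries."""
    return sum(r for (a, b), r in streams.items()
               if placement[a] != placement[b])
```

With two spout/bolt pairs that exchange 100 tuples each and a weak cross link of 1 tuple, the sketch co-locates each pair and leaves only the weak link between nodes.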

Spatio-temporal query algorithm based on Hilbert-R tree hierarchical index

HOU Haiyao, QIAN Yurong, YING Changtian, ZHANG Han, LU Xueyuan, ZHAO Yi

Journal of Computer Applications 2018, 38 (10): 2869-2874. DOI: 10.11772/j.issn.1001-9081.2018040749

To address multi-path queries in tree-based spatial indexes and the lack of a temporal index, a Hilbert-R tree index construction scheme combining time and clustering results was proposed. Firstly, the spatio-temporal dataset was divided according to the periodicity of data collection, and a time index was established on this basis. The spatial data was partitioned and encoded by the Hilbert curve, and the spatial coordinates were mapped to one-dimensional intervals. Secondly, according to the spatial distribution of the feature objects, a clustering algorithm that determines the K value dynamically was adopted to build an efficient Hilbert-R tree spatial index. Finally, a hierarchical indexing mechanism over time attributes and clustering results was built on several common key-value data structures of Redis. Compared with the Cache Conscious R+tree (CCR+), the proposed algorithm effectively reduces time overhead: query time is shortened by about 25% on average in experiments on spatio-temporal range queries and target vector object queries. It adapts well to data of different densities and can better support Redis for massive spatio-temporal data queries.
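The mapping from spatial coordinates to a one-dimensional value can be illustrated with the standard Hilbert-curve cell-to-index conversion. This is a generic textbook routine, not code from the paper; the grid resolution `order` and the function name are assumptions.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2**order x 2**order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has a consistent orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Cells that are close on the curve are also close in space, which is why a sorted key-value structure can serve range queries over the resulting one-dimensional keys.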

Partitioning and mapping algorithm for in-memory computing framework based on iterative filling

BIAN Chen, YU Jiong, XIU Weirong, YING Changtian, QIAN Yurong

Journal of Computer Applications 2017, 37 (3): 647-653. DOI: 10.11772/j.issn.1001-9081.2017.03.647

Spark offers only Hash and Range partitioning strategies, which often produce an unbalanced data load at the Reduce phase and sharply increase job duration. To address this issue, an Iterative Filling data Partitioning and Mapping algorithm (IFPM), which includes several innovative approaches, was proposed. First, based on an analysis of the job execution scheme of Spark, a job efficiency model and a partition mapping model were established, and the job execution timespan and the allocation incline degree were defined. Then, the Extendible Partitioning Algorithm (EPA) and the Iterative Mapping Algorithm (IMA) were proposed, which reserve part of the data in an extended region by means of a one-to-many partition function at the Map phase. Data in the extended region is mapped by additional iterative allocation until the approximate data distribution is obtained, and at the Reduce phase an adaptive mapping function, aware of the calculated data size, corrects the unbalanced data load of the original region allocation. Experimental results demonstrate that, for any data distribution, IFPM makes the allocation of data load from the Map phase to the Reduce phase more reasonable and optimizes the job efficiency of the in-memory computing framework.

Parallel access strategy for big data objects based on RAMCloud

CHU Zheng, YU Jiong, LU Liang, YING Changtian, BIAN Chen, WANG Yuefei

Journal of Computer Applications 2016, 36 (6): 1526-1532. DOI: 10.11772/j.issn.1001-9081.2016.06.1526

RAMCloud only supports the storage of small objects no larger than 1 MB, so big data objects that exceed this limit cannot be stored in a RAMCloud cluster. To resolve this storage limitation, a parallel access strategy for big data objects based on RAMCloud was proposed. Firstly, the big data object was divided into several small data objects of at most 1 MB each, and a data summary was created in the client. The small data objects were then stored in the RAMCloud cluster by the parallel access strategy. When reading, the data summary was read first, the small data objects were read in parallel from the RAMCloud cluster according to the data summary, and they were then merged into the big data object. The experimental results show that the storage time of the proposed parallel access strategy can reach 16 to 18 μs and the reading time can reach 6 to 7 μs without changing the architecture of the RAMCloud cluster. Under the InfiniBand network framework, the speedup of the proposed parallel strategy increases almost linearly, so big data objects can be accessed rapidly and efficiently at the microsecond level, just like small data objects.
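The split, summarize, and merge steps can be sketched as follows. This is a minimal illustration, not the paper's implementation and not the RAMCloud client API: `DictStore` is a hypothetical in-memory stand-in for the cluster, the key naming scheme is invented, and storing the summary in the same store is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MB per-object limit stated in the abstract

class DictStore:
    """Hypothetical in-memory stand-in for a key-value cluster client."""
    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data[key]

def put_big_object(store, name, payload, chunk_size=CHUNK_SIZE, workers=4):
    """Split payload into chunks, write them in parallel, then write a summary."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    keys = [f"{name}/part-{i}" for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(store.write, keys, chunks))  # force completion
    summary = {"keys": keys, "size": len(payload)}
    store.write(f"{name}/summary", summary)
    return summary

def get_big_object(store, name, workers=4):
    """Read the summary first, fetch the chunks in parallel, merge in order."""
    summary = store.read(f"{name}/summary")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(store.read, summary["keys"]))
    return b"".join(parts)
```

`pool.map` returns results in the order of the keys, so the chunks are merged in their original order regardless of which read finishes first.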

Energy-efficient strategy of distributed file system based on data block clustering storage

WANG Zhengying, YU Jiong, YING Changtian, LU Liang

Journal of Computer Applications 2015, 35 (2): 378-382. DOI: 10.11772/j.issn.1001-9081.2015.02.0378

To address the low server utilization and complicated energy management caused by the random block placement strategy in distributed file systems, a vector of access features was built for each data block to describe random block access behavior. The K-means algorithm was then used to cluster the blocks according to these vectors, and the datanodes were divided into multiple regions, each storing the data blocks of a different cluster. When the system load was low, data blocks were dynamically reconfigured according to the clustering results, and unneeded datanodes could sleep to reduce energy consumption. A configurable distance parameter between clusters makes the strategy suitable for scenarios with different requirements on energy consumption and utilization. Mathematical analysis and experimental results show that, compared with hot-cold zoning strategies, the proposed method saves energy more efficiently, reducing energy consumption by 35% to 38%.
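The clustering step can be illustrated with a bare-bones K-means over per-block feature vectors. The feature vectors, the naive choice of the first `k` points as initial centroids, and the fixed iteration count are assumptions for illustration; the abstract does not specify them.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means; returns (labels, centroids)."""
    # Naive initialization: the first k points (an assumption for brevity).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids
```

Blocks that share a label would then be placed in the same datanode region, so that regions holding rarely accessed clusters become candidates for sleeping.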

Data migration model based on RAMCloud hierarchical storage architecture

GUO Gang, YU Jiong, LU Liang, YING Changtian, YIN Lutong

Journal of Computer Applications 2015, 35 (12): 3392-3397. DOI: 10.11772/j.issn.1001-9081.2015.12.3392

To achieve efficient online storage of and access to huge amounts of data under the hierarchical storage architecture of the memory cloud, a Migration Model based on Data Significance (MMDS) was proposed. Firstly, the intrinsic importance of the data was calculated from factors such as the size of the data, the importance of time, and the total amount of user access. Secondly, the potential value of the data was evaluated by adopting user similarity and the importance ranking of the PageRank algorithm as used in recommendation systems. The importance of the data was determined jointly by its intrinsic importance and its potential value. Then, a data migration mechanism was designed based on the importance of the data. The experimental results show that the proposed model can identify the importance of the data and place the data hierarchically, and that it improves the data access hit rate of the storage system compared with the Least Recently Used (LRU), Least Frequently Used (LFU), and Migration Strategy based on Data Value (MSDV) algorithms. The proposed model relieves part of the storage pressure and improves data access performance.
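For readers unfamiliar with the importance ranking mentioned here, the following is a minimal power-iteration sketch of generic PageRank. How the paper builds its graph from user similarity is not stated in the abstract, so the `links` structure, the damping factor 0.85, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Generic PageRank by power iteration.

    links: {node: [nodes it points to]} (hypothetical graph)
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank
```

A score of this kind could serve as one input to a combined importance value, alongside intrinsic factors such as size and access count.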

Video recommendation algorithm fusing comment analysis and latent factor model

YIN Lutong, YU Jiong, LU Liang, YING Changtian, GUO Gang

Journal of Computer Applications 2015, 35 (11): 3247-3251. DOI: 10.11772/j.issn.1001-9081.2015.11.3247

Video recommendation still faces many challenges, such as the lack of metadata for online videos and the difficulty of extracting features directly from multimedia data. Therefore, a Video Recommendation algorithm Fusing Comment analysis and Latent factor model (VRFCL) was proposed. Starting from video comments, it first analyzed the sentiment orientation of user comments on multiple videos and produced numeric values representing each user's attitude towards the corresponding video. It then constructed a virtual rating matrix from these values, which compensated for data sparsity to some extent. Considering the diversity and high dimensionality of online videos, and in order to mine users' latent interest in online videos more deeply, it adopted the Latent Factor Model (LFM) to categorize online videos. LFM adds a latent category feature to the traditional recommendation system, which relies only on the dual user-item relationship. A series of experiments on YouTube review data was carried out to show that the VRFCL algorithm is effective.

Energy-efficient strategy for disks in RAMCloud

LU Liang, YU Jiong, YING Changtian, WANG Zhengying, LIU Jiankuang

Journal of Computer Applications 2014, 34 (9): 2518-2522. DOI: 10.11772/j.issn.1001-9081.2014.09.2518

The emergence of RAMCloud has improved the user experience of Online Data-Intensive (OLDI) applications. However, its energy consumption is higher than that of traditional cloud data centers. An energy-efficient strategy for disks under this architecture was put forward to solve this problem. Firstly, the fitness function and roulette wheel selection from genetic algorithms were introduced to choose energy-saving disks for persistent data backup. Secondly, a reasonable buffer size was used to extend the average continuous idle time of disks, so that some of them could be put into standby during their idle time. The simulation results show that the proposed strategy saves about 12.69% of energy in a given RAMCloud system with 50 servers. The buffer size affects both the energy-saving effect and data availability, so the two must be weighed against each other.
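Roulette wheel selection picks a candidate with probability proportional to its fitness. The sketch below shows the generic technique only; the fitness values, which in the paper would come from its own fitness function, are hypothetical inputs here.

```python
import random

def roulette_select(fitness, rng=random):
    """Pick one key with probability proportional to its fitness value.

    fitness: {candidate: non-negative fitness} (hypothetical values)
    rng:     object providing random(), e.g. random.Random(seed)
    """
    total = sum(fitness.values())
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    pick = rng.random() * total  # uniform in [0, total)
    running = 0.0
    chosen = None
    for item, fit in fitness.items():
        running += fit
        chosen = item
        if pick < running:
            break
    return chosen  # last item also covers floating-point round-off
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.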

Energy-efficient strategy for dynamic management of cloud storage replica based on user visiting characteristic

WANG Zhengying, YU Jiong, YING Changtian, LU Liang, BAN Aiqin

Journal of Computer Applications 2014, 34 (8): 2256-2259. DOI: 10.11772/j.issn.1001-9081.2014.08.2256

To address low server utilization and serious energy waste in the cloud computing environment, an energy-efficient strategy for dynamic management of cloud storage replicas based on user visiting characteristics was put forward. The study of user visiting characteristics was transformed into calculating the visiting temperature of each Block, and a DataNode actively applied to sleep according to the global visiting temperature so as to save energy. The dormancy application and dormancy verification algorithms were given in detail, and the strategy for handling visits during DataNode dormancy was described explicitly. The experimental results show that, after adopting this strategy, 29%-42% of DataNodes can sleep, energy consumption is reduced by 31%, and server response time remains good. The performance analysis shows that the proposed strategy can effectively reduce energy consumption while guaranteeing data availability.

Optimal storing strategy based on small files in RAMCloud

YING Changtian, YU Jiong, LU Liang, LIU Jiankuang

Journal of Computer Applications 2014, 34 (11): 3104-3108. DOI: 10.11772/j.issn.1001-9081.2014.11.3104

RAMCloud stores data using a log segment structure. When a large number of small files are stored in RAMCloud, each small file occupies a whole segment, which may lead to heavy fragmentation inside the segments and low memory utilization. To solve this small file problem, a strategy based on file classification was proposed to optimize the storage of small files. Firstly, small files were classified into three categories: structurally related, logically related, and independent files. Before uploading, a merging algorithm and a grouping algorithm were used to process these files according to their category. The experiment demonstrates that, compared with non-optimized RAMCloud, the proposed strategy improves memory utilization.
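The benefit of merging small files before upload can be illustrated with a simple packing sketch. The abstract does not describe the paper's merging and grouping algorithms, so this example substitutes first-fit-decreasing bin packing, a generic heuristic; the file names, sizes, and segment size are hypothetical.

```python
def pack_into_segments(file_sizes, segment_size):
    """First-fit-decreasing packing of files into fixed-size segments.

    file_sizes:   {file name: size} (hypothetical inputs)
    segment_size: capacity of one segment, in the same unit
    Returns a list of segments, each a list of file names.
    """
    segments = []  # each entry is [remaining space, [file names]]
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > segment_size:
            raise ValueError(f"{name!r} does not fit in one segment")
        for seg in segments:
            if seg[0] >= size:  # first segment with enough room
                seg[0] -= size
                seg[1].append(name)
                break
        else:
            segments.append([segment_size - size, [name]])
    return [names for _, names in segments]
```

Five files of sizes 6, 5, 4, 3 and 2 fit into two segments of size 10 under this heuristic, instead of the five segments needed when each file occupies its own segment.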

Energy efficient scheduling for multiple directed acyclic graph in cloud computing

LIU Danqi, YU Jiong, YING Changtian

Journal of Computer Applications 2013, 33 (09): 2410-2415. DOI: 10.11772/j.issn.1001-9081.2013.09.2428

Existing energy-efficient scheduling algorithms for multiple Directed Acyclic Graphs (DAGs) fail to save energy efficiently, have a narrow application scope, and cannot take performance optimization into account. To solve these problems, Multiple Relation Energy Optimizing (MREO) was proposed for multiple DAG workflows. Based on an analysis of the characteristics of computation-intensive and communication-intensive tasks, MREO integrated independent tasks to reduce the number of processors used. Backtracking and branch-and-bound algorithms were employed to select the best integration path dynamically while reducing the complexity of the algorithm. The experimental results demonstrate that MREO reduces computation and communication energy cost efficiently and achieves a good energy-saving effect while guaranteeing the performance of multiple DAG workflows.